Basic Git Version Control
Git is a distributed version control system which tracks changes in a set of files. Its goals include speed and efficiency, data integrity, and support for non-linear workflows. It was originally created by Linus Torvalds in 2005 for the development of the Linux kernel. As a distributed version control system, every directory using Git is a full-fledged repository, with complete history and full version-tracking abilities, which is independent of network access or a central server. These notes rely on the ideas and learnings from the Git documentation and "Pro Git", 2nd Edition, by Scott Chacon, Ben Straub, and various other contributors. Git is free and open-source software distributed under the GPL-2.0-only licence.
Initial Background
The main difference between distributed and centralized version control is that a distributed system locally mirrors the entire repository, including its complete history, for each client, rather than only checking out the latest working copy or snapshot while the rest of the repository remains on a centralized server. A distinctive attribute of Git is that it treats each version as a snapshot of a miniature file system - in other words, whenever a commit is made, a snapshot of all files at that moment is recorded (along with the author, commit date, and commit message) and stored with pointers to its parent commits. Every object is identified by a checksum of its contents (a SHA-1 hash by default), and files which have not changed are not stored again but are referenced through the checksum of the existing blob. This is in contrast to most alternative version control options, which focus on storing the changes made to files over time (known as delta-based version control). Because the complete history is held locally, most operations in Git only need local files and resources and do not interact with a network.
Because of this, an attribute which makes Git stand out from alternative version control options is its branching model. This model allows and encourages clients to have multiple local branches which can be entirely independent of each other, with simple creation, merging, deletion, and switching between the different branches. Branches can also be organized by role or feature, such as for development, experimentation, testing, and production. Moreover, it is possible to segment development locally, selectively push individual branches to the remote repository, and adapt to the workflow of the project. Each local repository is also a complete clone of the entire repository, even if only a specific branch is checked out, which gives each client a full backup of the project (which can be restored if the main server fails or becomes corrupted).
Relative to Git, there are 3 main states in which a tracked file can reside. These are modified for files which have been changed but not yet staged (associated with the working tree), staged for files which have been changed and marked to be included in the next commit (associated with the staging area or index), and committed for files whose changes are safely stored in the local repository (associated with the .git directory, which holds the metadata and object database for the repository). Once committed, the files are seen as unmodified until they are changed again in the working tree. A file is untracked when it is first created and has not yet been staged, and an untracked file may additionally be ignored by Git so that it is never offered for staging.
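The states described above can be observed through the short status at each step. The following is a minimal sketch, assuming Git is installed and a temporary directory can be created; the file name notes.txt and the identity values are hypothetical.

```shell
set -eu

# Throwaway repository (assumption: mktemp is available).
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "first draft" > notes.txt
untracked="$(git status --short)"          # new file, not yet tracked

git add notes.txt
staged="$(git status --short)"             # staged for the next commit

git commit --quiet --message "Add notes."
committed="$(git status --short)"          # nothing to report once committed

echo "second draft" >> notes.txt
modified="$(git status --short)"           # changed in the working tree only

printf '%s\n' "$untracked" "$staged" "[$committed]" "$modified"
```

The two status columns refer to the staging area (left) and the working tree (right), which is why the staged and modified lines differ only in where the letter appears.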
When setting up Git, there are various configuration variables stored in configuration files which control the operational aspects. The configuration file is located at project/.git/config for a project (the default when setting a value), ~/.gitconfig or ~/.config/git/config for the global user, or /etc/gitconfig for the system (requires administrative or superuser privilege to change). Where the same variable is set at several levels, the project value overrides the global value, which in turn overrides the system value. The main configuration variables include a username and email address as an identity which is immutably baked into commits. Other configuration variables which can be queried, set, replaced, or unset include the default text editor, name of the default branch upon initialization, pruning behaviour when fetching, template for commit messages, set of files to ignore, delay before an auto-corrected command is run, and aliases for custom commands.
~ $ git [--version] [--help] [-C <path>] [-c <name>=<value>] [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path] [--paginate|--no-pager] [--no-replace-objects] [--bare] [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>] [--super-prefix=<path>] [--config-env=<name>=<envvar>] <command> [<args>]
~ $ git help --all
~ $ git config --list
~ $ git config --list --local
~ $ git config --list --global
~ $ git config --list --system
~ $ git config --global user.name "First Last"
~ $ git config --global user.email "firstlast@mail.com"
~ $ git config --global --edit
~ $ git config --global core.editor "nano -w"
~ $ git config --global init.defaultBranch "main"
~ $ git config --global pull.rebase "false"
~ $ git config --global fetch.prune true
~ $ git config --global commit.template ~/.gitmessage.txt
~ $ git config --global core.excludesfile ~/.gitignore.txt
~ $ git config --global help.autocorrect 1
~/project-directory $ git config --global alias.unstage "reset HEAD --"
~/project-directory $ git config --global alias.empty "commit --allow-empty"
~/project-directory $ git config --global alias.last "log -1 HEAD"
~/project-directory $ git config --global alias.visual "!gitk"
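Setting, querying, and unsetting a variable can be tried without touching the global configuration by working at the project level. This is a small sketch under the assumption that a temporary repository is acceptable; the name "Project Author" is hypothetical.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Set values at the project level only; --local writes to .git/config.
git config --local user.name "Project Author"
git config --local alias.last "log -1 HEAD"

# Query: without a level, the most specific level which sets the value wins.
effective_name="$(git config --get user.name)"
stored_alias="$(git config --local --get alias.last)"

# Unset: the key is removed from .git/config again.
git config --local --unset alias.last
if git config --local --get alias.last > /dev/null; then
    alias_present="yes"
else
    alias_present="no"
fi

echo "name: $effective_name, alias was: $stored_alias, alias still set: $alias_present"
```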
Repository Setup
A new local repository can be created in the current directory or another specific target directory. The creation of a repository involves the initialization of the necessary configuration files and skeleton for the repository under .git. Alternatively, it is possible to clone an existing remote repository through Hypertext Transfer Protocol (HTTP) or Secure Shell Protocol (SSH), where this repository will already have existing configuration files and skeleton. When a repository is cloned, a working copy is automatically checked out from the initial branch and its files are already tracked by Git. When a repository is newly initialized, any existing files in the directory remain untracked until they are staged and committed.
~ $ git init
~ $ git init ./project
~ $ git clone https://example.com/project.git
~ $ git clone ssh://example.com/project.git
~ $ git clone https://example.com/project.git ./project-directory
~ $ git clone https://example.com/project.git --origin nameRemote
~ $ git clone https://example.com/project.git --branch nameBranch
It is good practice to commit a snapshot each time the branch reaches a state worth recording. At that point, the relevant files and their changes are added to the staging area and, once all intended modified and new files are staged, the commit is made to the repository with a message describing the modifications and additions. For reference when staging files, it is possible to check which files have changed since the last commit, which files are new in the working tree, and which files are currently in the staging area for the next commit. The exact changes in modified files can also be compared, with the differences and associated line numbers being detailed.
It should be noted that a new file will initially be untracked when it is created - specifically, any files in the working tree are untracked if they were not in the previous snapshot and are not in the staging area. Changes made to untracked files are not recorded, such that a file needs to be staged before Git begins to track it. Once a file is tracked, any subsequent changes will be noticed. It is also possible to set a specific file or directory, types of files, or standard glob patterns (similar to simplified regular expressions) to be intentionally ignored within .gitignore in the root of the repository, which is applied recursively to the entire repository (it is also allowable to have a .gitignore in sub-directories, applied relative to the path of the sub-directory). (With regard to a short status, ?? refers to new files which are not tracked, A refers to new files which have been added to the staging area, M in the left column refers to modified files which have been staged, M in the right column refers to modified files which have not been staged, and MM refers to modified files which have been staged and then modified again).
~/project-directory $ git status
~/project-directory $ git status --short
~/project-directory $ git add "Example.txt"
~/project-directory $ git add "./directory/example"
~/project-directory $ git add "."
~/project-directory $ git diff
~/project-directory $ git diff --staged
~/project-directory $ git difftool
Example patterns in .gitignore for files which are ignored and are not tracked:
# Comment
*.log
[ABC].pt
[0-9].md
Directory/
/TODO.txt
!Extra/**/*.log
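Whether a given path is matched by these patterns can be checked with git check-ignore. The sketch below writes the patterns listed above into a temporary repository and tests a handful of paths; the file and directory names are hypothetical and only chosen to exercise each pattern.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init --quiet

cat > .gitignore <<'EOF'
# Comment
*.log
[ABC].pt
[0-9].md
Directory/
/TODO.txt
!Extra/**/*.log
EOF

# Create sample paths so that directory patterns have real directories to match.
mkdir -p Directory Extra/sub docs
touch debug.log A.pt 7.md Directory/file.txt TODO.txt docs/TODO.txt Extra/sub/keep.log

# Report whether Git would ignore a path (exit status 0 means ignored).
is_ignored() {
    if git check-ignore --quiet "$1"; then
        echo "ignored"
    else
        echo "not ignored"
    fi
}

for path in debug.log A.pt 7.md Directory/file.txt TODO.txt docs/TODO.txt Extra/sub/keep.log; do
    printf '%-22s %s\n' "$path" "$(is_ignored "$path")"
done
```

The leading slash anchors /TODO.txt to the root of the repository, so docs/TODO.txt is left alone, and the final negated pattern re-includes log files under Extra even though *.log would otherwise match them.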
Once the files are staged with acceptable changes, they can be committed to create a snapshot of the repository with a SHA-1 hash as a checksum for reference. In aggregate, all the commits form the history of the repository and they should be documented with descriptive messages. It is good practice for the commit message to describe what the effect of the changes would be if they were merged into another branch. There are also commands which remove, move, or rename files and stage the result directly, as opposed to using system commands from the command line and then staging the modifications afterwards.
~/project-directory $ git commit
~/project-directory $ git commit --message "Description of the effect of the changes."
~/project-directory $ git commit --all --message "Description of the effect of the changes."
~/project-directory $ git log -3
~/project-directory $ git log --patch
~/project-directory $ git log --stat
~/project-directory $ git log --pretty=oneline --graph
~/project-directory $ git log --pretty=format:"%h - %an, %ar: %s"
~/project-directory $ git log --grep="Search in commit messages."
~/project-directory $ git rm "Example.txt"
~/project-directory $ git mv "Directory 1/Example.txt" "Directory 2/Example.txt"
A commit can be amended by making and staging further changes to create a new commit which replaces the previous commit, such that the previous commit no longer appears in the history of the branch. In most cases, it is good practice to only amend commits which are only in the local repository and have not been pushed to the remote repository (otherwise it will be necessary to force the push, which can cause issues for other clients). The contents of a file can also be restored to discard the changes in a modified file or to unstage a staged file. If there has been an error with staging or incorrect commits, the local repository can also be reset to a previous commit, with changes either being kept or discarded relative to the working tree and staging area. Similarly, it is possible to revert a modified file to its contents in a previous commit or in the staging area. However, caution should always be applied when undoing changes, as it is not always possible to easily redo something which has been mistakenly undone.
~/project-directory $ git commit --amend --message "Description of the effect of the changes."
~/project-directory $ git restore --source main~3 "Example.txt"
~/project-directory $ git restore --staged "Example.txt"
~/project-directory $ git restore --staged --worktree "Example.txt"
~/project-directory $ git reset "Example.txt"
~/project-directory $ git reset --merge HASH123
~/project-directory $ git reset --soft HEAD~3
~/project-directory $ git reset --mixed HEAD~3
~/project-directory $ git reset --hard nameBranch
~/project-directory $ git checkout --force nameBranch
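The difference between the reset modes is easiest to see by undoing the same commit in stages. The following is a sketch in a temporary repository (file name and identity are hypothetical): a soft reset keeps the change staged, a mixed reset keeps it only in the working tree, and a hard reset discards it.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit --quiet --message "First."
echo "two" >> file.txt
git commit --quiet --all --message "Second."

# Soft: move the branch back one commit; the change stays staged.
git reset --quiet --soft HEAD~1
after_soft="$(git status --short)"

# Mixed (the default mode): also reset the staging area; the change stays in the working tree.
git reset --quiet --mixed
after_mixed="$(git status --short)"

# Hard: also reset the working tree; the change is discarded.
git reset --quiet --hard
after_hard="$(git status --short)"

printf '%s\n' "$after_soft" "$after_mixed" "[$after_hard]"
cat file.txt
```

Only the last step loses work, which is why a hard reset deserves the most caution of the three.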
The default remote repository is implicitly added with the shortname origin when a repository is cloned (a newly initialized repository has no remote until one is added). Additional remote repositories can be added explicitly when working with several collaborators. To synchronize local changes on a branch with a remote repository, it is necessary to push the commits to the specified remote repository (updates the remote branch to the local state, where only a fast-forward is accepted unless the push is forced). To check whether there have been changes to remote repositories, it is necessary to fetch the latest information of the remote repositories which is not yet in the local repository, along with the objects necessary to complete the histories of the updated branches. To synchronize remote changes on a branch with a local repository, it is necessary to pull the commits from the specified remote repository (fetches and then merges remote states to local states, where fast-forward is used by default if possible). If the local and remote branches have diverged, it will be necessary to specify how to reconcile the divergent branches. Read access to the remote repository is needed to fetch and pull, and write access is needed to push.
~/project-directory $ git remote --verbose
~/project-directory $ git remote show nameRemote
~/project-directory $ git remote add nameRemote git://example.com/repository.git
~/project-directory $ git remote rename nameRemoteOld nameRemoteNew
~/project-directory $ git remote prune nameRemote
~/project-directory $ git remote remove nameRemote
~/project-directory $ git push --all
~/project-directory $ git push nameRemote nameBranch
~/project-directory $ git push nameRemote nameBranch --force
~/project-directory $ git fetch --all
~/project-directory $ git fetch nameRemote nameBranch
~/project-directory $ git fetch --multiple nameRemote nameRemote nameRemote
~/project-directory $ git fetch --prune nameRemote
~/project-directory $ git fetch nameRemote --force
~/project-directory $ git pull --all
~/project-directory $ git pull nameRemote nameBranch
~/project-directory $ git pull nameRemote nameBranch --force
~/project-directory $ git fetch --all
~/project-directory $ git merge nameRemote/nameBranch
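The push, fetch, and merge cycle can be rehearsed entirely on one machine by letting a bare repository stand in for the remote server. This is a sketch under that assumption; the directory names server.git, alice, and bob and the identity values are hypothetical.

```shell
set -eu

work="$(mktemp -d)"
cd "$work"

# A bare repository stands in for the remote server.
git init --quiet --bare server.git
git -C server.git symbolic-ref HEAD refs/heads/main    # make "main" its default branch

# First collaborator: create history and push it.
git init --quiet alice
cd alice
git symbolic-ref HEAD refs/heads/main                  # name the initial branch "main"
git config user.name "Alice Example"
git config user.email "alice@example.com"
git remote add origin "$work/server.git"
echo "hello" > README.txt
git add README.txt
git commit --quiet --message "Add README."
git push --quiet --set-upstream origin main

# Second collaborator: cloning adds the remote under the shortname origin.
cd "$work"
git clone --quiet server.git bob
remote_name="$(git -C bob remote)"

# The first collaborator publishes another commit.
cd "$work/alice"
echo "more" >> README.txt
git commit --quiet --all --message "Extend README."
git push --quiet origin main

# The second collaborator fetches, inspects what is new, then fast-forwards.
cd "$work/bob"
git fetch --quiet origin
behind="$(git rev-list --count main..origin/main)"
git merge --quiet --ff-only origin/main

echo "remote: $remote_name, commits behind before merging: $behind"
```

Fetching only updates the remote-tracking branch origin/main; the local branch moves when the merge is performed, which is the same two steps that a pull performs together.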
Tagging provides the ability to mark specific points in the history of a repository - typically for version release points. There is support for lightweight and annotated tags. A lightweight tag is a pointer to a specific commit and meant for private or temporary object labels (in a sense, this can be thought of as a branch which does not change). An annotated tag is stored as an object with a checksum, creation date, tagger details, tagging message, and optional signature for verification. A tag can also be set retrospectively on a specific commit by using its checksum. As tags are not pushed by default, it is also necessary to explicitly push the tags to the specified remote repository.
~/project-directory $ git tag --list -n1 "*"
~/project-directory $ git show nameTag
~/project-directory $ git tag nameTag
~/project-directory $ git tag --annotate nameTag --message "This is a new version for release."
~/project-directory $ git tag --sign nameTag --message "This is a signed version."
~/project-directory $ git tag --annotate nameTag HASH123
~/project-directory $ git tag --delete nameTag
~/project-directory $ git push nameRemote --delete nameTag
~/project-directory $ git tag --delete nameTag
~/project-directory $ git push nameRemote :refs/tags/nameTag
~/project-directory $ git push nameRemote nameTag
~/project-directory $ git push nameRemote --tags
~/project-directory $ git push nameRemote --tags --follow-tags
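The distinction between the two kinds of tag can be confirmed by asking Git for the type of object behind each name. The following is a brief sketch in a temporary repository; the tag names v0.1-draft and v1.0 and the identity values are hypothetical.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example User"        # hypothetical identity, recorded as the tagger
git config user.email "user@example.com"

echo "release notes" > CHANGELOG.txt
git add CHANGELOG.txt
git commit --quiet --message "Add changelog."

git tag v0.1-draft                                                      # lightweight
git tag --annotate v1.0 --message "This is a new version for release."  # annotated

# A lightweight tag names the commit itself; an annotated tag names a tag object.
light_type="$(git cat-file -t v0.1-draft)"
annotated_type="$(git cat-file -t v1.0)"

# Both still resolve to the same underlying commit.
light_target="$(git rev-parse "v0.1-draft^{commit}")"
annotated_target="$(git rev-parse "v1.0^{commit}")"

echo "lightweight: $light_type, annotated: $annotated_type"
```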
Branching And Merging
As alluded to above, branching involves creating an alternate and independent line which diverges from the main line of development. This functionality is intuitive in Git, as it is exceptionally lightweight for managing and switching between different branches. This is possible due to the way in which snapshots are stored as objects in a content-addressable filesystem (a key-value data store in which the key is the checksum of the content): blobs (the contents of each version of a file), trees (lists of file and directory names with the checksums of the matching blobs and sub-trees, allowing for the re-creation of the files at any point), and commits (a pointer to the top-level tree along with the author, message, and parent commits). The index lists the files and checksums which will form the tree of the next commit. So, a branch is simply a movable pointer to a specific commit (using SHA-1 hashes). The default branch name is master (although this is often renamed to main in newer projects).
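The object types mentioned above can be inspected directly with low-level commands. This is an illustrative sketch in a temporary repository, assuming the initial branch is named main (set explicitly below); the file name greeting.txt and the identity values are hypothetical.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main"
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "hello" > greeting.txt
git add greeting.txt
git commit --quiet --message "Add greeting."

# The blob is keyed by a checksum of the content, so hashing the file again gives the same ID.
blob_id="$(git rev-parse HEAD:greeting.txt)"
rehashed_id="$(git hash-object greeting.txt)"

# Each layer of the snapshot is a separate object type.
blob_type="$(git cat-file -t "$blob_id")"
tree_type="$(git cat-file -t "HEAD^{tree}")"
commit_type="$(git cat-file -t HEAD)"

# The tree maps the file name to the blob, and the branch is a pointer to the commit.
tree_listing="$(git cat-file -p "HEAD^{tree}")"
branch_target="$(git rev-parse refs/heads/main)"
head_target="$(git rev-parse HEAD)"

echo "$tree_listing"
echo "types: $blob_type, $tree_type, $commit_type"
```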
A new branch can be created, which simply adds a new movable pointer to a specific commit. It is possible to switch to a different branch by checking it out to be the current branch. To distinguish the current branch, there is a special reference named HEAD pointing to the local branch which is currently checked out (in a detached state, HEAD does not point to any branch, but instead points directly to a specific commit, such as one named by a tag or remote-tracking branch). It should be emphasized that the files of the project will always reflect the latest commit of the current branch, together with any modifications which have not yet been committed. For organization, it can be convenient to name branches in the form of prefix/shortname for a hierarchical scheme, where the prefix refers to the type, topic, developer, or team for which the branch is intended to be used.
~/project-directory $ git branch --list --verbose "*"
~/project-directory $ git show nameBranch
~/project-directory $ git branch nameBranch
~/project-directory $ git branch nameBranch startPoint
~/project-directory $ git branch --move nameBranchOld nameBranchNew
~/project-directory $ git push --set-upstream nameRemote nameBranchNew
~/project-directory $ git push nameRemote --delete nameBranchOld
~/project-directory $ git branch --delete nameBranch
~/project-directory $ git push nameRemote --delete nameBranch
~/project-directory $ git branch --delete nameBranch
~/project-directory $ git push nameRemote :nameBranch
~/project-directory $ git checkout nameBranch
~/project-directory $ git checkout -b nameBranch
~/project-directory $ git checkout -b nameBranch startPoint
~/project-directory $ git switch nameBranch
~/project-directory $ git switch --create nameBranch
~/project-directory $ git switch --create nameBranch startPoint
~/project-directory $ git checkout --track nameRemote/nameBranch
~/project-directory $ git branch --set-upstream-to nameRemote/nameBranch
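That a branch is only a movable pointer, and that HEAD records which branch is current, can be seen in a few commands. The sketch below uses a temporary repository and assumes Git 2.23 or newer for git switch; the branch name feature and the identity values are hypothetical.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main"
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit --quiet --message "Base."

# Creating a branch adds a second pointer to the same commit.
git branch feature
if [ "$(git rev-parse main)" = "$(git rev-parse feature)" ]; then
    same_at_creation="yes"
else
    same_at_creation="no"
fi

# HEAD follows the branch which is checked out.
git switch --quiet feature
current="$(git symbolic-ref --short HEAD)"

# A commit moves only the current branch forward.
echo "work" > feature.txt
git add feature.txt
git commit --quiet --message "Feature work."
ahead="$(git rev-list --count main..feature)"

# Detached: HEAD points at a commit directly rather than at a branch.
git switch --quiet --detach main
if git symbolic-ref --quiet HEAD > /dev/null; then
    detached="no"
else
    detached="yes"
fi

echo "current: $current, feature ahead of main by: $ahead, detached: $detached"
```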
Merging takes the current branch and another branch which forms an independent line of development and integrates the other branch into the current branch. The sequence of commits is considered from the point at which their histories diverged and these are combined into a unified history. A fast-forward is possible when the tip of the other branch can be reached by linearly following the history forward from the tip of the current branch, such that it is possible to simply move the pointer of the current branch forward, because there are no divergent changes to take into account (and it is not necessary to make a commit). If a fast-forward is not possible (divergent changes within the branches), a 3-way merge (or true merge) will be performed using the 2 snapshots at the tips of the branches and the snapshot at the common ancestor of the branches. A 3-way merge creates a merge commit, which is special in that it has more than 1 parent.
~/project-directory $ git merge nameRemote/nameBranch
~/project-directory $ git merge -m "Merge into 'main'. Add details." nameRemote/nameBranch
~/project-directory $ git merge --edit nameRemote/nameBranch
~/project-directory $ git merge --no-commit nameRemote/nameBranch
~/project-directory $ git merge --no-ff nameRemote/nameBranch
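The two outcomes of a merge can be compared by counting the parents of the resulting commit. This is a minimal sketch in a temporary repository, assuming Git 2.23 or newer for git switch; the branch names feature and topic, the helper count_parents, and the identity values are hypothetical.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main"
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

# Hypothetical helper: number of parents of a commit.
count_parents() {
    echo $(( $(git log -1 --format=%P "$1" | wc -w) ))
}

echo "base" > base.txt
git add base.txt
git commit --quiet --message "Base."

# Case 1: main has not moved, so merging feature is a fast-forward.
git switch --quiet --create feature
echo "feature" > feature.txt
git add feature.txt
git commit --quiet --message "Feature."
git switch --quiet main
git merge --quiet feature
ff_parents="$(count_parents HEAD)"

# Case 2: both branches gain a commit, so a 3-way merge is needed.
git switch --quiet --create topic
echo "topic" > topic.txt
git add topic.txt
git commit --quiet --message "Topic."
git switch --quiet main
echo "main" > main.txt
git add main.txt
git commit --quiet --message "Main."
git merge --quiet -m "Merge branch 'topic'." topic
merge_parents="$(count_parents HEAD)"

echo "fast-forward tip parents: $ff_parents, merge commit parents: $merge_parents"
```

After the fast-forward, the tip of main is simply the existing commit from feature, whereas the second merge creates a new commit which joins both lines of history.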
With a 3-way merge, the common ancestor is used as a base and serves as a reference against which the changes on each branch can be compared. The merge algorithm determines whether the separate versions of files differ in ways which are irreconcilable. If they cannot be reconciled due to different changes to the same parts of a file, one or more merge conflicts are reported, which will prevent the merge from being completed until the issue has been solved (the merge pauses before the merge commit is made). A merge conflict can be resolved through manual intervention in a similar method of modifying, staging, and committing files, as the merge conflict will be marked with <<<<<<< (current branch), =======, and >>>>>>> (source branch) in the respective file. This involves performing modifications to resolve the merge conflict (remove markers and choose the content of either the current branch, source branch, or something else), staging the changes as part of the commit, and then completing the merge commit. Because of the possibility of merge conflicts, starting a merge with non-trivial uncommitted changes is discouraged.
~/project-directory $ git merge nameRemote/nameBranch
~/project-directory $ git status
~/project-directory $ git mergetool
~/project-directory $ git merge --continue
~/project-directory $ git merge --abort
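A conflict can be provoked deliberately to practise the resolution steps. The sketch below changes the same line on two branches in a temporary repository, then resolves the conflict by hand instead of through a merge tool; the file name settings.txt, its contents, and the identity values are hypothetical, and GIT_EDITOR=true is used only to accept the default merge message without opening an editor.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main"
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "colour = red" > settings.txt
git add settings.txt
git commit --quiet --message "Base."

# Change the same line differently on each branch.
git switch --quiet --create feature
echo "colour = blue" > settings.txt
git commit --quiet --all --message "Use blue."
git switch --quiet main
echo "colour = green" > settings.txt
git commit --quiet --all --message "Use green."

# The merge stops with a non-zero exit status when a conflict is found.
if git merge feature > /dev/null 2>&1; then
    conflicted="no"
else
    conflicted="yes"
fi
status="$(git status --short)"
markers="$(grep -c -E '^(<<<<<<< |=======$|>>>>>>> )' settings.txt)"

# Resolve: write the intended content, stage it, and complete the merge.
echo "colour = teal" > settings.txt
git add settings.txt
GIT_EDITOR=true git merge --continue > /dev/null

echo "conflicted: $conflicted, status: $status, marker lines: $markers"
```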
The strategy or method for the logic used in the merge can also be specified. The default option is ort (Ostensibly Recursive's Twin), while other options include recursive, resolve, octopus, ours, and subtree. As an alternative to a 3-way merge, it is possible to combine a rebase with a fast-forward merge. A rebase involves taking the changes which were committed on the current branch and re-applying them on top of another branch. This operation works by going to the common ancestor of the 2 branches, getting the changes introduced by each commit of the current branch, saving those changes to temporary files, resetting the current branch to the same commit as the other branch, and then applying each change in turn from the temporary files. There is no difference in the final result of the integration between following a merge or rebase, but rebasing usually leads to a cleaner history which looks linear - if examining the log of a rebased branch, it appears as if the commits happened in series, even though they may have originally happened in parallel (commonly used in projects with many contributors, such that integration is simple for maintainers). The alternative argument is that the history should be an accurate record and should not be changed.
With a more complex structure, it is possible to transplant a sub-branch which has been split from a parent branch onto another branch with a distant common ancestor, such that the result looks as if the sub-branch was originally split from the other branch. The strategy or method for the logic used in the rebase can also be specified. The default option is ort (Ostensibly Recursive's Twin), while other options include recursive, resolve, octopus, ours, and subtree. An important consideration to emphasize is that a branch should not be rebased if it is used by other collaborators, as a rebase will result in the associated commits being abandoned and re-created with new checksums, which will no longer align with work done by collaborators (especially when pushes are forced).
~/project-directory $ git checkout nameBranchFeature
~/project-directory $ git rebase nameBranchMain
~/project-directory $ git switch nameBranchMain
~/project-directory $ git merge nameBranchFeature
~/project-directory $ git rebase --onto nameBranchMain nameBranchParent nameBranchCurrent
~/project-directory $ git rebase nameBranchMain
~/project-directory $ git status
~/project-directory $ git mergetool
~/project-directory $ git rebase --continue
~/project-directory $ git rebase --abort
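The effect of a rebase followed by a fast-forward merge can be checked by comparing checksums before and after. This is a sketch in a temporary repository, assuming Git 2.23 or newer for git switch; the branch name feature, the file names, and the identity values are hypothetical.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main"
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit --quiet --message "Base."

# Let the two branches diverge with one commit each.
git switch --quiet --create feature
echo "feature" > feature.txt
git add feature.txt
git commit --quiet --message "Feature."
old_tip="$(git rev-parse feature)"

git switch --quiet main
echo "main" > main.txt
git add main.txt
git commit --quiet --message "Main."

# Re-apply the commits of feature on top of main.
git switch --quiet feature
git rebase --quiet main
new_tip="$(git rev-parse feature)"
base_after="$(git merge-base main feature)"

# main can now be fast-forwarded, leaving a linear history without a merge commit.
git switch --quiet main
git merge --quiet --ff-only feature
merges="$(git rev-list --merges --count main)"
total="$(git rev-list --count main)"

echo "tip changed: $old_tip -> $new_tip"
echo "merge commits: $merges, total commits: $total"
```

The changed checksum of the feature tip is the reason a rebase is unsuitable for branches which other collaborators have already based work on.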
Graphical User Interfaces
There are several graphical user interfaces which act as clients to allow for intuitive interaction with a repository. The most popular clients include Sourcetree (although it is proprietary and only available on Windows and Mac), Git Extensions (although it is only available on Windows), Gitnuro, GitFiend, MeGit, Gittyup, GitQlient, Kommit, Gitg, and Giggle. The basic functionality and operations will be available in most clients (such as fetching, pushing, tagging, branching, and merging), although it is not possible to map all of the commands and every option. The most convenient use may be as a viewer, where it is simple to see past commits and complete history of the project, and for choosing which files are to be staged for the next commit, where changes or partial changes can be selectively picked.